Hayden Miedema & Gabriel Daenzer

**HW 11 - Cache**

**6. (Patterson and Hennessy 7.29) Suppose a byte-addressable computer's address size is k bits, the cache size is S bytes, the block size is B bytes, and the cache is A-way set associative. Give the formulas for the following in terms of S, B, A, and k:**

**Offset = log(b)**

**Index = S / B (number of blocks)**

* **Number of lines = (S/B)/A**
* **Number of index bits = log(S / (B\*A))**
* **Number of tag bits = k - (index + offset)**
* **Number of bits for cache = (S \* 8) + (index \* (tag bits + 1 + (A-1))**

**7. Consider two CPUs:**

* **P1 has a 1KB L1 cache with an 11.4% miss rate and a .62ns hit time.**
* **P2 has a 2KB L1 cache with an 8% miss rate and a .66ns hit time.**

**For simplicity, assume that instruction fetches don't cause cache misses. If 36% of**

**all instructions are memory accesses (e.g., lw and sw), the memory access time is**

**70ns, and the CPUs clock period is set to the cache hit time, then**

1. **What is the CPI for each machine?**

P1: (70/ 0.62) = 112.903 cycles

P2: (70/ 0.66) = 106.06 cycles

P1 AMAT= 113.903(0.114) + 1 (0.886) = 13.871

P2 AMAT= 107.06(0.08) + 1 (0.92) = 9.48

P1 CPI = 1 + (0.36)\* (13.871) = 5.993

P2 CPI = 1 + (0.36)\*(9.48) = 4.4128

1. **Which machine is faster? (Remember, they have different clock periods.)**

P1 time = 5.993 \* 0.62 = 3.72

P2 time = 4.4128 \* 0.66 = 2.91 ← faster

1. **Suppose a 4KB L1 cache would have a .7ns hit time. What miss rate would be necessary for this CPU to out-perform the other two?**

P3 : (70/.70) = 100

P3 AMAT = 101(x) + 1(1-x)

P3 CPI = 1 + (0.36)(101(x) + 1(1-x))

4.4128 \* 0.66 < 1 + (0.36)(101(x) + 1(1-x)) \* (0.70)

X < .06589 or 6.589%

1. **What speedup would result from doubling the speed of P1's L1 cache and clock frequency while leaving the main memory speed at 70ns? (In other words, what happens if the clock period is now .31ns, a cache access is still 1 cycle, and a cache miss is still 70ns?)**

P1: (70/ 0.62) = 112.903 cycles

P1 new: (70/ 0.31) = 225.806 cycles

P1 AMAT= 113.903(0.114) + 1 (0.886) = 13.871

P1 new AMAT= 226.806(0.114) + 1 (0.886) = 26.74

P1 CPI = 1 + (0.36)\* (13.871) = 5.993

P1 new CPI = 1 + (0.36)\*(26.74) = 10.6264

P1 time = 5.993 \* 0.62 = 3.72

P1 New time = 10.6264 \* 0.31 = 3.2942

Speedup = 3.72 / 3.2942 = 1.12925

So about a 12.92% speedup

1. **What speedup would result from doubling P1's main memory speed (to 35ns)?**

P1: (70/ 0.62) = 112.903 cycles

P1 new: (35/ 0.62) = 56.45 cycles

P1 AMAT= 113.903(0.114) + 1 (0.886) = 13.871

P1 new AMAT=57.45(0.114) + 1 (0.886) = 7.4353

P1 CPI = 1 + (0.36)\* (13.871) = 5.993

P1 new CPI = 1 + (0.36)\* (7.4353) = 3.6767

P1 time = 5.993 \* 0.62 = 3.72

P1 new time = 3.6767 \* 0.62 = 2.28

Speedup = 3.72 / 2.28 = 1.632

So there is a speed up of about 63.2%

**8. Consider a CPU with a split L1 cache and a unified L2 cache.**

* **90% of instruction access hit in the L1 instruction cache.**
* **98% of the L1 instruction cache misses hit in the L2 cache.**
* **85% of data accesses hit in the L1 data cache.**
* **93% of L1 data cache misses hit in the L2 cache.**
* **35% of instructions make a data access.**
* **The L1 caches have a 1 cycle access time.**
* **The L2 cache has a 6 cycle access time.**
* **Main Memory has a 90 cycle access time.**

**Compute the following:**

1. **What is the average memory access time for instruction accesses?**

Instruction AMAT = 1(0.90) + 7(0.10)(0.98) + 97(0.10)(0.02)

= 1.78

1. **What is the average memory access time for data accesses?**

Data AMAT = 1(0.85) + 7(0.15)(0.93) + 97(0.15)(0.07)

= 2.845

1. **We could double the size of each L1 cache if we made the clock 20% slower. By what percent would the L1 hit rates need to improve in order for the slower CPU to have better performance? (Assume that the L2 miss rate does not change, but that the L2 and memory access *cycle* times reflect the slower clock period.)**

CPU1 CPI = 1 + 0.78(1.00) + 1.845(.35)

= 2.42575

CPU2 Instruction AMAT = 1(x) + 7(1 - x)(0.98) + 97(1 - x)(0.02)

CPU2 Data AMAT = 1(y) + 7(1 - y)(0.93) + 97(1 - y)(0.07)

CPU2 CPI

= 1

+ ((x + 7(1 - x)(0.98) + 97(1 - x)(0.02)) - 1)

+ (0.35)((y + 7(1 - y)(0.93) + 97(1 - y)(0.07)) - 1)

Compare CPU1 & CPU2 CPIs

2.42575 > 1.20 (1

+ ((x + 7(1 - x)(0.98) + 97(1 - x)(0.02)) - 1)

+ (0.35)((y + 7(1 - y)(0.93) + 97(1 - y)(0.07)) - 1)

)

Wolfram

optimize(x,y) = 2.42575 > 1.2 (8.8 (1 - x) + x + 0.35 (-1 + 13.3 (1 - y) + y))

x > .869044 if y = 1

y > .762727 if x = 1

So with the CPU slowdown we can still achieve a faster CPU with the optimize formula above. However we know that if the L1 miss rate for instruction memory is below about 86.9044% or the L1 miss rate for data memory is below about 76.2727%, the new CPU is guaranteed to be slower.